
228 research outputs found

Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition

Author: Bouselmi Ghazi
Fohr Dominique
Illina Irina
Publication date: 27/08/2007

In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation, and HMM model retraining on the HIWIRE foreign-accented English speech database. The "phonetic confusion" scheme we have developed consists of associating several sequences of confused phones with each spoken phone. In our experiments, we have used different combinations of acoustic models representing the canonical and the foreign pronunciations: spoken and native models, and models adapted to the non-native accent with MAP and MLLR. The joint use of pronunciation modelling and acoustic adaptation led to further improvements in recognition accuracy. The best combination of the above-mentioned techniques resulted in a relative word error reduction ranging from 46% to 71%.
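
As a reading aid for the "relative word error reduction" figure quoted in this abstract, the following minimal sketch shows how such a figure is usually computed. The function name and the error rates in the example are hypothetical illustrations, not values from the paper.

```python
def relative_error_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative reduction in word error rate.

    Both arguments are error rates in the same unit (for example, percent).
    The result is the fraction of the baseline error that was removed.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical error rates, chosen only to illustrate the arithmetic.
baseline = 10.0  # WER of the unadapted system, in percent
adapted = 5.4    # WER after adaptation, in percent
print(f"relative reduction: {relative_error_reduction(baseline, adapted):.0%}")
```

A relative reduction is always stated against the baseline error, so the same absolute gain yields a larger relative figure when the baseline error is small.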

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

Author: Fohr Dominique
Illina Irina
Publication date: 02/11/2020

The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between the training and the testing conditions, for example because of noise. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to do this by rescoring the ASR N-best hypothesis list. To achieve this, we train a deep neural network (DNN). Our DNN rescoring model aims to select hypotheses that have better semantic consistency and therefore lower WER. We investigate two types of representations as part of the input features to our DNN model: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Acoustic and linguistic features are also included. We perform experiments on the publicly available TED-LIUM dataset mixed with real noise. The proposed rescoring approaches give a significant improvement in WER over the ASR system without rescoring models in two noisy conditions and with n-gram and RNNLM.
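
The general idea of N-best rescoring can be sketched as follows. This is not the DNN model described in the abstract: it substitutes a plain weighted sum of per-hypothesis scores, and the `Hypothesis` fields, the weights, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic: float  # acoustic log-score from the recognizer (assumed given)
    lm: float        # language-model log-score (assumed given)
    semantic: float  # semantic-consistency log-score (assumed given)


def rescore(nbest, w_ac=1.0, w_lm=1.0, w_sem=1.0):
    """Return the hypothesis with the highest weighted combined score."""
    return max(
        nbest,
        key=lambda h: w_ac * h.acoustic + w_lm * h.lm + w_sem * h.semantic,
    )


# Hypothetical 2-best list: the second entry scores slightly better
# acoustically, but the first is more semantically consistent.
nbest = [
    Hypothesis("recognize speech", acoustic=-10.0, lm=-5.0, semantic=-4.0),
    Hypothesis("wreck a nice beach", acoustic=-9.5, lm=-5.2, semantic=-6.0),
]
print(rescore(nbest).text)
print(rescore(nbest, w_sem=0.0).text)
```

Setting `w_sem` to zero reproduces a recognizer that ignores semantic evidence, which makes the effect of the additional score easy to compare.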

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Fast Channel and Noise Compensation in the Spectral Domain

Author: Cerisara Christophe
Fohr Dominique
Publication venue: HAL CCSD
Publication date: 01/01/2002

We compare in this work several methods for fast adaptation of speech models to convolutional and additive noise. The tested algorithms are Parallel Model Combination (PMC), Cepstral Mean Subtraction (CMS), and an algorithm that combines PMC and CMS in the spectral domain. Experiments are carried out on a natural numbers recognition task in French. We have trained the acoustic models on the SPEECHDAT database (recorded through telephone lines), and we have tested the system on the VODIS database (recorded in three different cars).
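
Of the algorithms named in this abstract, Cepstral Mean Subtraction is simple enough to sketch in a few lines. A stationary convolutional channel appears as a constant additive offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes it. The sketch below covers plain CMS only, not PMC or the combined spectral-domain algorithm; the function name and the toy frames are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean from each cepstral coefficient.

    `frames` is a list of equal-length lists, one cepstral vector per frame.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


# Toy cepstra, plus the same cepstra shifted by a constant "channel" offset.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
channel = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, channel)] for frame in clean]

# After CMS, both versions yield the same normalized features
# (up to floating-point rounding).
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(distorted))
```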

ZENODO

INRIA a CCSD electronic archive server

Detection of Phone Boundaries for Non-Native Speech using French-German Models

Author: Fohr Dominique
Mella Odile
Publication venue: HAL CCSD
Publication date: 01/09/2015

Within the framework of computer-assisted foreign language learning for the French/German pair, we evaluate different HMM phone models for detecting accurate phone boundaries. The optimal parameters are determined by minimizing, on the non-native speech corpus, the number of phones whose boundaries are shifted by more than 20 ms compared to the manual boundaries. We observe that the best performance is obtained by combining a French native HMM model with an automatically selected German native HMM model.
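
The selection criterion in this abstract, the number of phones whose boundaries are shifted by more than 20 ms relative to the manual boundaries, could be computed along the following lines. The function name, the assumption that automatic and manual boundaries are already aligned one-to-one, and the example times are hypothetical.

```python
def count_shifted_boundaries(auto_ms, manual_ms, tolerance_ms=20.0):
    """Count boundaries whose automatic position deviates from the manual
    position by more than `tolerance_ms` milliseconds.

    Assumes both lists describe the same phone sequence in the same order.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    return sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) > tolerance_ms
    )


# Hypothetical boundary times in milliseconds.
manual = [100.0, 250.0, 400.0, 520.0]
automatic = [105.0, 275.0, 398.0, 545.0]
print(count_shifted_boundaries(automatic, manual))
```

Minimizing this count over candidate model parameters gives a selection procedure in the spirit of the one the abstract describes.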

INRIA a CCSD electronic archive server

Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Author: Aletras Nikolaos
Bose Tulika
Fohr Dominique
Illina Irina
Publication date: 18/09/2022

State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This typically occurs because classifiers overemphasize source-specific information, which negatively impacts their domain invariance. Prior work has attempted to penalize terms related to hate speech from manually curated lists using feature attribution methods, which quantify the importance assigned to input terms by the classifier when making a prediction. We instead propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation. Comment: COLING 2022 pre-print

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Semi-automatic phonetic labelling of large corpora

Author: Fohr Dominique
Mella Odile
Publication venue: HAL CCSD
Publication date: 22/09/1997

This paper presents a methodology for semi-automatically labelling large corpora. The methodology is based on three main points: using several concurrent automatic stochastic labellers, decomposing the labelling of the whole corpus into an iterative refining process, and building a labelling comparison procedure that takes into account phonological and acoustic-phonetic rules to evaluate the similarity of the various labellings of one sentence. After detailing these three points, we describe our HMM-based labelling tool and the application of the methodology to the Swiss French POLYPHON database.

INRIA a CCSD electronic archive server

RNN Language Model Estimation for Out-of-Vocabulary Words

Author: Fohr Dominique
Illina Irina
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

One important issue for speech recognition systems is out-of-vocabulary (OOV) words. These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun probability estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of the closest in-vocabulary (IV) words (list of brothers) to a given OOV proper noun. The RNNLM probabilities of these words are used to estimate the probabilities of OOV proper nouns. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.

Crossref

INRIA a CCSD electronic archive server

Out-of-Vocabulary Word Probability Estimation using RNN Language Model

Author: Fohr Dominique
Illina Irina
Publication venue: HAL CCSD
Publication date: 17/11/2017

One important issue for speech recognition systems is out-of-vocabulary (OOV) words. These words, often proper nouns or new words, are essential for documents to be transcribed correctly. Thus, they must be integrated into the language model (LM) and the lexicon of the speech recognition system. This article proposes new approaches to OOV proper noun estimation using a Recurrent Neural Network Language Model (RNNLM). The proposed approaches are based on the notion of the closest in-vocabulary (IV) words (list of brothers) to a given OOV proper noun. The RNNLM probabilities of these words are used to estimate the probabilities of OOV proper nouns. Three methods for retrieving the relevant list of brothers are studied. The main advantages of the proposed approaches are that the RNNLM is not retrained and its architecture is kept intact. Experiments on real text data from the website of the Euronews channel show relative perplexity reductions of about 14% compared to the baseline RNNLM.

INRIA a CCSD electronic archive server